CM4125 Topic 7 Lab Part 1 (TSD) Guided¶

Load the ether.csv data set into the variable ether

using the first column ('Date') as the index, and parse this column as a date

In [1]:
## Load ether data, considering first column has time
Out[1]:
Open High Low Close Volume
Date
2020-03-13 20:00:00 129.94 131.82 126.87 128.71 14952.18
2020-03-13 19:00:00 119.51 132.02 117.10 129.94 60675.35
2020-03-13 18:00:00 124.47 124.85 115.50 119.51 40771.21
2020-03-13 17:00:00 124.08 127.42 121.63 124.47 22119.04
2020-03-13 16:00:00 124.85 129.51 120.17 124.08 35884.55
... ... ... ... ... ...
2017-07-01 15:00:00 265.74 272.74 265.00 272.57 5581.66
2017-07-01 14:00:00 268.79 269.90 265.00 265.74 6367.05
2017-07-01 13:00:00 274.83 274.93 265.00 268.79 11204.43
2017-07-01 12:00:00 275.01 275.01 271.00 274.83 3023.14
2017-07-01 11:00:00 279.98 279.99 272.10 275.01 2455.28

23674 rows × 5 columns

In Python, you can check that the index column is a data type (i.e. Timestamp), and not a string

In [2]:
## You should do the same in your tool, and ensure the timestamp has been recognised correctly
2020-03-13 20:00:00
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

Sort the dataset by date in ascending order

Q. When did this data set begin?

Answer:

The data set contains the following columns:

Price Data:

  • Open - The price of the commodity (ether) at the start of the time period (minute)
  • High - The highest value the price reached during the time period
  • Low - The lowest value the price reached during the time period
  • Close - The price at the end of the time period

Trade Data:

  • Volume - The total amount of the commodity bought/sold in the time period

Re-sample the data on a daily basis ('1D')

Afterwards, you need to perform aggregations to the columns

Column Aggregation Function
Open The first value
High The maximum
Low The minimum
Close The last value
Volume The sum
In [3]:
## If you are doing this with the purpose of visualising, you can follow a tutorial
## for tableau, you could use this: https://community.tableau.com/s/news/a0A4T000001v7OUUAY/candlestick-charts-in-tableau?cq_cmp=20663578515&cq_net=x&cq_plac=&gad=1&gclid=Cj0KCQjwqP2pBhDMARIsAJQ0CzrWw8RPTBM2JczNFWXiQFoc-3mmgAvJHtSANkHqWV2GxDk6vF8YuKYaAnsoEALw_wcB&gclsrc=aw.ds
## or you can check my example: https://public.tableau.com/views/Candlestick_TSD/Sheet1?:language=en-GB&:display_count=n&:origin=viz_share_link
Out[3]:
Open High Low Close Volume
Date
2017-07-01 279.98 279.99 253.23 263.12 108835.60
2017-07-02 263.12 293.73 255.70 285.00 189772.27
2017-07-03 285.00 285.00 273.07 276.37 125450.81
2017-07-04 276.37 282.83 265.00 269.20 109560.93
2017-07-05 269.20 274.97 252.67 266.27 163719.09
... ... ... ... ... ...
2020-03-09 199.46 208.65 189.85 202.77 457449.62
2020-03-10 202.77 206.28 195.17 200.50 184956.96
2020-03-11 200.50 202.98 181.00 194.61 237607.68
2020-03-12 194.61 195.64 101.22 110.30 1271270.79
2020-03-13 110.30 148.00 90.00 128.71 1191586.11

987 rows × 5 columns

Check that the Close price is the same as the Open price for the next week

Calculate the price half way between the Low and High for each day and store the result in a column named Ether (ETH)

In [4]:
## This is a column calculation as seen in Topic 5 with concepts from this week
Out[4]:
Open High Low Close Volume Ether (ETH)
Date
2017-07-01 279.98 279.99 253.23 263.12 108835.60 266.610
2017-07-02 263.12 293.73 255.70 285.00 189772.27 274.715
2017-07-03 285.00 285.00 273.07 276.37 125450.81 279.035
2017-07-04 276.37 282.83 265.00 269.20 109560.93 273.915
2017-07-05 269.20 274.97 252.67 266.27 163719.09 263.820
... ... ... ... ... ... ...
2020-03-09 199.46 208.65 189.85 202.77 457449.62 199.250
2020-03-10 202.77 206.28 195.17 200.50 184956.96 200.725
2020-03-11 200.50 202.98 181.00 194.61 237607.68 191.990
2020-03-12 194.61 195.64 101.22 110.30 1271270.79 148.430
2020-03-13 110.30 148.00 90.00 128.71 1191586.11 119.000

987 rows × 6 columns

Remove all columns other than Ether (ETH)

Filter the data to include only data from December of 2017

You should have 31 row $\times$ 1 column (plus the Date index)

Round the Ether (ETH) column to 2 decimal places

In [5]:
## Now you need to drop all columns except for Ether (ETH)
## Then, filter based on condition (topic 3)
## Finally, round the numbers using 2 decimal places
Out[5]:
Ether (ETH)
Date
2017-12-01 445.00
2017-12-02 463.08
2017-12-03 466.75
2017-12-04 461.20
2017-12-05 460.78
2017-12-06 443.50
2017-12-07 431.95
2017-12-08 449.56
2017-12-09 492.55
2017-12-10 462.06
2017-12-11 482.62
2017-12-12 607.56
2017-12-13 695.38
2017-12-14 709.42
2017-12-15 662.10
2017-12-16 709.58
2017-12-17 724.96
2017-12-18 746.90
2017-12-19 822.69
2017-12-20 767.30
2017-12-21 806.08
2017-12-22 629.98
2017-12-23 722.50
2017-12-24 688.00
2017-12-25 726.69
2017-12-26 754.54
2017-12-27 736.50
2017-12-28 709.17
2017-12-29 734.00
2017-12-30 720.00
2017-12-31 725.00

Load the data set bitcoin_dec.csv as daily_bitcoin (the first column is the index)

  • https://www.dropbox.com/s/efq5od9q7d4yg1l/bitcoin_dec.csv?raw=1
In [6]:
## Load the bitcoin dataset. again consider 1st column has time
Out[6]:
Bitcoin (BTC)
Date
2017-12-01 10165.01
2017-12-02 10950.00
2017-12-03 11185.08
2017-12-04 11265.50
2017-12-05 11694.96
2017-12-06 12818.08
2017-12-07 15225.00
2017-12-08 15856.32
2017-12-09 15700.48
2017-12-10 14630.75
2017-12-11 16355.55
2017-12-12 16980.36
2017-12-13 16742.08
2017-12-14 16642.59
2017-12-15 17287.56
2017-12-16 18726.43
2017-12-17 19434.50
2017-12-18 18720.23
2017-12-19 17900.00
2017-12-20 16740.68
2017-12-21 16209.95
2017-12-22 13279.48
2017-12-23 14797.50
2017-12-24 14050.00
2017-12-25 14008.00
2017-12-26 15035.50
2017-12-27 15510.50
2017-12-28 14507.40
2017-12-29 14532.30
2017-12-30 13544.60
2017-12-31 13351.10

Join the two data sets to get a comparison of Ether (ETH) and Bitcoin (BTC) prices in December of 2017

Reset the index, so that the index become a named column

In [7]:
## Join based on data (or copy the data next to one another)
Out[7]:
Bitcoin (BTC) Ether (ETH)
Date
2017-12-01 10165.01 445.00
2017-12-02 10950.00 463.08
2017-12-03 11185.08 466.75
2017-12-04 11265.50 461.20
2017-12-05 11694.96 460.78
2017-12-06 12818.08 443.50
2017-12-07 15225.00 431.95
2017-12-08 15856.32 449.56
2017-12-09 15700.48 492.55
2017-12-10 14630.75 462.06
2017-12-11 16355.55 482.62
2017-12-12 16980.36 607.56
2017-12-13 16742.08 695.38
2017-12-14 16642.59 709.42
2017-12-15 17287.56 662.10
2017-12-16 18726.43 709.58
2017-12-17 19434.50 724.96
2017-12-18 18720.23 746.90
2017-12-19 17900.00 822.69
2017-12-20 16740.68 767.30
2017-12-21 16209.95 806.08
2017-12-22 13279.48 629.98
2017-12-23 14797.50 722.50
2017-12-24 14050.00 688.00
2017-12-25 14008.00 726.69
2017-12-26 15035.50 754.54
2017-12-27 15510.50 736.50
2017-12-28 14507.40 709.17
2017-12-29 14532.30 734.00
2017-12-30 13544.60 720.00
2017-12-31 13351.10 725.00

Melt the data set to long format

The entity IDs are in the Date column

Set the var name to 'Commodity' and value name to 'Price (USD)'

You should have 62 rows $\times$ 3 columns

In [8]:
## Melt the dataset with Date as the id, Commodity as name for the variable and Price (USD) as the name for the value
Out[8]:
Date Commodity Price (USD)
0 2017-12-01 Bitcoin (BTC) 10165.01
1 2017-12-02 Bitcoin (BTC) 10950.00
2 2017-12-03 Bitcoin (BTC) 11185.08
3 2017-12-04 Bitcoin (BTC) 11265.50
4 2017-12-05 Bitcoin (BTC) 11694.96
... ... ... ...
57 2017-12-27 Ether (ETH) 736.50
58 2017-12-28 Ether (ETH) 709.17
59 2017-12-29 Ether (ETH) 734.00
60 2017-12-30 Ether (ETH) 720.00
61 2017-12-31 Ether (ETH) 725.00

62 rows × 3 columns

Below you can see a semi-log plot of the price of Bitcoin and Ether in December 2017. Try to reproduce it with your "weapon of choice"!

In [9]:
## Plot using any tool, keeping in mind a log scale will ake it look better!